64 research outputs found

    Localising In-Domain Adaptation of Transformer-Based Biomedical Language Models

    In the era of digital healthcare, the huge volumes of textual information generated every day in hospitals constitute an essential but underused asset that could be exploited with task-specific, fine-tuned biomedical language representation models, improving patient care and management. For such specialized domains, previous research has shown that models fine-tuned from broad-coverage checkpoints can benefit substantially from additional training rounds over large-scale in-domain resources. However, these resources are often unreachable for less-resourced languages like Italian, preventing local medical institutions from employing in-domain adaptation. In order to reduce this gap, our work investigates two accessible approaches to derive biomedical language models in languages other than English, taking Italian as a concrete use case: one based on neural machine translation of English resources, favoring quantity over quality; the other based on a high-grade, narrow-scoped corpus natively written in Italian, thus preferring quality over quantity. Our study shows that data quantity is a harder constraint than data quality for biomedical adaptation, but the concatenation of high-quality data can improve model performance even when dealing with relatively size-limited corpora. The models published from our investigations have the potential to unlock important research opportunities for Italian hospitals and academia. Finally, the set of lessons learned from the study constitutes valuable insights towards a solution for building biomedical language models that are generalizable to other less-resourced languages and different domain settings. Comment: 8 pages, 2 figures, 6 tables. Published in the Journal of Biomedical Informatics.
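
    As a rough illustration of the kind of in-domain adaptation described above, the sketch below continues masked-language-model pretraining of a general-domain Italian checkpoint on a biomedical corpus using the Hugging Face transformers and datasets libraries. The checkpoint name, corpus file and hyperparameters are placeholders for illustration, not the configuration used in the paper.

```python
# Hedged sketch: continued masked-language-model pretraining of a general-domain
# Italian checkpoint on a biomedical corpus. Checkpoint and file names are placeholders.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

checkpoint = "dbmdz/bert-base-italian-xxl-cased"   # assumed broad-coverage starting point
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# biomedical_it.txt stands in for either the machine-translated or the native corpus.
corpus = load_dataset("text", data_files={"train": "biomedical_it.txt"})["train"]
corpus = corpus.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                    batched=True, remove_columns=["text"])

# Random masking of 15% of tokens, the standard MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="bio-bert-it", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=5e-5)
Trainer(model=model, args=args, train_dataset=corpus, data_collator=collator).train()
```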

    Guidelines for the monitoring of Rosalia alpina

    Rosalia alpina (Linnaeus, 1758) is a large longhorn beetle (Coleoptera: Cerambycidae) which is protected by the Habitats Directive and which typically inhabits beech forests characterised by the presence of mature, dead (or moribund) and sun-exposed trees. A revision of the current knowledge on the systematics, ecology and conservation of R. alpina is reported. The research was carried out as part of the LIFE MIPP project, which aims to find a standard monitoring method for saproxylic beetles protected in Europe. For monitoring this species, different methods were tested and compared in two areas of the Apennines, utilising wild trees, logs and tripods (artificially built from beech wood), all potentially suitable for the reproduction of the species. Although all methods succeeded in detecting the target species, the results showed that the use of wild trees outperformed the other methods: it allowed more adults to be observed and required less intensive labour. However, monitoring the rosalia longicorn on wild trees has the main disadvantage that such trees can hardly be considered "standard sampling units", as each tree may be differently attractive to adults. Our results demonstrated that the most important factors influencing the attractiveness of single trunks were wood volume, sun exposure and decay stage. Based on the results obtained during the LIFE MIPP project, as well as on a literature review, a standard monitoring method for R. alpina was developed.

    Towards cross-cohort estimation of cognitive decline in neurodegenerative diseases

    Heterogeneity of cohorts, in terms of inclusion criteria, design of follow-up visits and batteries of cognitive assessments, hinders any thorough comparison between them. For that reason, we build a cross-cohort model of cognitive decline that can be personalized to any patient, allowing partially or totally missing scores to be imputed. This makes it possible to compare, at an individual level, the disease progression of subjects from different cohorts, with a temporal realignment and with respect to a broader set of biomarkers.
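
    The following toy sketch illustrates the general idea of temporal realignment and imputation, not the authors' actual model: each cognitive score follows a fixed population trajectory on a common disease timeline, a subject-specific time shift is estimated from the scores that were observed, and the remaining scores are imputed from the realigned trajectories. The logistic curves, parameters and function names are all hypothetical.

```python
# Illustrative sketch only: realign a subject onto fixed population trajectories via a
# single time shift, then impute missing scores. Scores are assumed normalised to [0, 1].
import numpy as np
from scipy.optimize import minimize_scalar

def pop_curve(t, onset, rate):
    """Population-average trajectory of one score on a common disease timeline (logistic)."""
    return 1.0 / (1.0 + np.exp(-rate * (t - onset)))

# Hypothetical population parameters (onset age, rate) for three cognitive scores.
params = [(70.0, 0.8), (72.0, 0.6), (74.0, 0.5)]

def impute(age, observed):
    """observed: array with np.nan for missing scores; returns realigned time and filled scores."""
    seen = ~np.isnan(observed)

    def loss(shift):
        # Compare population predictions at the shifted time with the available scores.
        pred = np.array([pop_curve(age + shift, o, r) for o, r in params])
        return np.sum((pred[seen] - observed[seen]) ** 2)

    shift = minimize_scalar(loss, bounds=(-20, 20), method="bounded").x
    pred = np.array([pop_curve(age + shift, o, r) for o, r in params])
    filled = np.where(seen, observed, pred)   # keep observed values, impute the missing ones
    return age + shift, filled

t_aligned, scores = impute(age=70.0, observed=np.array([0.8, np.nan, 0.4]))
```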

    Using normative modelling to detect disease progression in mild cognitive impairment and Alzheimer’s disease in a cross-sectional multi-cohort study

    Normative modelling is an emerging method for quantifying how individuals deviate from the healthy population pattern. Several machine learning models have been implemented to develop normative models to investigate brain disorders, including regression, support vector machines and Gaussian process models. With the advance of deep learning technology, the use of deep neural networks has also been proposed. In this study, we assessed normative models based on deep autoencoders using structural neuroimaging data from patients with Alzheimer’s disease (n = 206) and mild cognitive impairment (n = 354). We first trained the autoencoder on an independent dataset (UK Biobank) with 11,034 healthy controls. Then, we estimated how each patient deviated from this norm and established which brain regions were associated with this deviation. Finally, we compared the performance of our normative model against traditional classifiers. As expected, we found that patients exhibited deviations according to the severity of their clinical condition. The model identified medial temporal regions, including the hippocampus, and the ventricular system as critical regions for the calculation of the deviation score. Overall, the normative model had cross-cohort generalizability comparable to that of traditional classifiers. To promote open science, we are making all scripts and the trained models available to the wider research community.
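
    A minimal sketch of the deviation-score idea is shown below, assuming PyTorch and a vector of regional volumes per subject: an autoencoder is trained to reconstruct healthy controls only, and a patient's deviation from the norm is then measured by the reconstruction error. The architecture, feature count and training data here are illustrative placeholders rather than the published model.

```python
# Illustrative sketch: autoencoder trained on healthy controls' regional brain features;
# a patient's deviation score is the reconstruction error under that healthy norm.
import torch
import torch.nn as nn

n_regions = 100                      # assumed number of structural features per subject

autoencoder = nn.Sequential(         # simple fully connected encoder/decoder
    nn.Linear(n_regions, 32), nn.ReLU(),
    nn.Linear(32, 8),                # bottleneck forces the model to learn the healthy pattern
    nn.Linear(8, 32), nn.ReLU(),
    nn.Linear(32, n_regions),
)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

healthy = torch.randn(2048, n_regions)           # placeholder for normalised control data
for epoch in range(50):                          # training on controls only
    optimizer.zero_grad()
    loss = loss_fn(autoencoder(healthy), healthy)
    loss.backward()
    optimizer.step()

# Deviation of a new subject from the healthy norm: per-region and global reconstruction error.
with torch.no_grad():
    subject = torch.randn(1, n_regions)          # placeholder for one patient's features
    region_deviation = (autoencoder(subject) - subject) ** 2
    deviation_score = region_deviation.mean().item()
```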

    Advancing Italian Biomedical Information Extraction with Large Language Models: Methodological Insights and Multicenter Practical Application

    The introduction of computerized medical records in hospitals has reduced burdensome operations like manual writing and information fetching. However, the data contained in medical records are still far underutilized, primarily because extracting them from unstructured textual medical records takes time and effort. Information Extraction, a subfield of Natural Language Processing, can help clinical practitioners overcome this limitation using automated text-mining pipelines. In this work, we created the first Italian neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to develop a Large Language Model for this task. Moreover, we conducted several experiments with three external independent datasets to implement an effective multicenter model, with an overall F1-score of 84.77%, precision of 83.16% and recall of 86.44%. The lessons learned are: (i) the crucial role of a consistent annotation process and (ii) a fine-tuning strategy that combines classical methods with a "few-shot" approach. This allowed us to establish methodological guidelines that pave the way for future implementations in this field and allow Italian hospitals to tap into important research opportunities.
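
    As a hedged sketch of the general fine-tuning recipe (not the exact PsyNIT pipeline), the snippet below sets up token-classification fine-tuning with the Hugging Face transformers library, including the usual alignment of word-level entity tags to sub-word tokens. The label set, checkpoint name and hyperparameters are assumptions for illustration.

```python
# Sketch of Named Entity Recognition fine-tuning as token classification.
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification, Trainer, TrainingArguments)

labels = ["O", "B-DRUG", "I-DRUG", "B-SYMPTOM", "I-SYMPTOM"]     # illustrative tag set
checkpoint = "dbmdz/bert-base-italian-xxl-cased"                  # assumed starting model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=len(labels))

def tokenize_and_align(example):
    """Tokenize a pre-split sentence and align word-level tags to sub-word tokens."""
    enc = tokenizer(example["tokens"], is_split_into_words=True, truncation=True)
    word_ids = enc.word_ids()
    # Label only the first sub-token of each word; -100 is ignored by the loss.
    enc["labels"] = [example["ner_tags"][w]
                     if w is not None and (i == 0 or word_ids[i - 1] != w) else -100
                     for i, w in enumerate(word_ids)]
    return enc

args = TrainingArguments(output_dir="clinical-ner-it", num_train_epochs=5,
                         per_device_train_batch_size=16, learning_rate=3e-5)
# train_set would be an annotated clinical dataset mapped through tokenize_and_align:
# Trainer(model=model, args=args, train_dataset=train_set,
#         data_collator=DataCollatorForTokenClassification(tokenizer)).train()
```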

    Virtual brain simulations reveal network-specific parameters in neurodegenerative dementias

    INTRODUCTION: Neural circuit alterations lie at the core of brain pathophysiology, yet they are hard to unveil in living subjects. The Virtual Brain (TVB) modeling, by exploiting structural and functional magnetic resonance imaging (MRI), yields mesoscopic parameters of connectivity and synaptic transmission. METHODS: We used TVB to simulate brain networks, which are key for human brain function, in Alzheimer's disease (AD) and frontotemporal dementia (FTD) patients, whose connectivity and synaptic parameters remain largely unknown; we then compared them to healthy controls to reveal novel in vivo pathological hallmarks. RESULTS: The patterns of simulated parameters differed between AD and FTD, shedding light on disease-specific alterations in brain networks. Individual subjects displayed subtle differences in network parameter patterns that significantly correlated with their individual neuropsychological, clinical, and pharmacological profiles. DISCUSSION: These TVB simulations, by providing a new personalized set of network parameters, open new perspectives for understanding the mechanisms of dementias and for designing personalized therapeutic approaches.
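
    To make the idea of fitting mesoscopic network parameters concrete, the sketch below uses plain NumPy (deliberately not the actual TVB API) to simulate a toy network of coupled neural masses on a structural connectome and to select the global coupling value whose simulated functional connectivity best matches the empirical one. All data in the example are random placeholders.

```python
# Conceptual sketch: sweep a global coupling parameter of a toy rate model on a structural
# connectome and keep the value whose simulated FC best correlates with the empirical FC.
import numpy as np

rng = np.random.default_rng(0)
n_nodes = 68
structural = np.abs(rng.normal(size=(n_nodes, n_nodes)))     # placeholder for a DTI connectome
np.fill_diagonal(structural, 0.0)
empirical_fc = np.corrcoef(rng.normal(size=(n_nodes, 500)))  # placeholder for measured FC

def simulate_fc(global_coupling, steps=5000, dt=0.01, tau=1.0, noise=0.1):
    """Integrate a simple stochastic rate model dx/dt = (-x + tanh(G * W @ x)) / tau + noise."""
    x = np.zeros(n_nodes)
    history = np.empty((steps, n_nodes))
    for t in range(steps):
        drive = np.tanh(global_coupling * structural @ x)
        x = x + dt * (-x + drive) / tau + np.sqrt(dt) * noise * rng.normal(size=n_nodes)
        history[t] = x
    return np.corrcoef(history.T)                # simulated functional connectivity

# Fit the mesoscopic parameter by maximising similarity between simulated and empirical FC.
upper = np.triu_indices(n_nodes, 1)
couplings = np.linspace(0.01, 0.5, 10)
fits = [np.corrcoef(simulate_fc(g)[upper], empirical_fc[upper])[0, 1] for g in couplings]
best_coupling = couplings[int(np.argmax(fits))]
```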

    Multi-study validation of data-driven disease progression models to characterize evolution of biomarkers in Alzheimer's disease

    Understanding the sequence of biological and clinical events along the course of Alzheimer's disease provides insights into dementia pathophysiology and can help participant selection in clinical trials. Our objective is to train two data-driven computational models for sequencing these events, the Event Based Model (EBM) and discriminative-EBM (DEBM), on the basis of well-characterized research data, and then validate the trained models on subjects from clinical cohorts characterized by less-structured data-acquisition protocols. Seven independent data cohorts were considered, totalling 2389 cognitively normal (CN), 1424 mild cognitive impairment (MCI) and 743 Alzheimer's disease (AD) patients. The Alzheimer's Disease Neuroimaging Initiative (ADNI) data set was used as the training set for the construction of the disease models, while a collection of multi-centric data cohorts was used as the test set for validation. Cross-sectional information related to clinical, cognitive, imaging and cerebrospinal fluid (CSF) biomarkers was used. Event sequences obtained with EBM and DEBM showed differences in the ordering of single biomarkers, but according to both models the first biomarkers to become abnormal were those related to CSF, followed by cognitive scores, while structural imaging showed significant volumetric decreases at later stages of the disease progression. Staging of test set subjects based on the sequences obtained with both models showed good linear correlation.
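
    The core of the Event Based Model can be summarised in a few lines: given, for each subject and biomarker, the probability that the measurement looks abnormal, the likelihood of a candidate event ordering is obtained by marginalising over the subject's unknown disease stage. The simplified sketch below assumes a uniform prior over stages and is not the exact implementation used in the study.

```python
# Simplified Event-Based Model likelihood: score a candidate biomarker ordering by the
# total likelihood of all subjects, marginalised over their unknown disease stages.
import numpy as np

def ordering_log_likelihood(p_abnormal, ordering):
    """
    p_abnormal: (n_subjects, n_biomarkers) probability that each measurement is abnormal.
    ordering:   permutation of biomarker indices, earliest event first.
    """
    p = p_abnormal[:, ordering]            # reorder columns to follow the candidate sequence
    n_subjects, n_events = p.shape
    stage_likelihoods = []
    for k in range(n_events + 1):          # stage k: the first k events have already occurred
        occurred = np.prod(p[:, :k], axis=1)
        not_yet = np.prod(1.0 - p[:, k:], axis=1)
        stage_likelihoods.append(occurred * not_yet)
    # Uniform prior over stages; average over stages, take the log, sum over subjects.
    subject_likelihood = np.mean(np.stack(stage_likelihoods, axis=1), axis=1)
    return np.sum(np.log(subject_likelihood + 1e-300))

# Toy usage with three biomarkers (e.g. CSF, cognition, MRI volume) and three subjects:
p = np.array([[0.9, 0.7, 0.2],
              [0.8, 0.4, 0.1],
              [0.6, 0.5, 0.3]])
best = max([(0, 1, 2), (2, 1, 0)], key=lambda s: ordering_log_likelihood(p, list(s)))
```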